Abstract. We present an algorithm for evaluating a linear "intersection transform" of a
function defined on the lattice of subsets of an $n$-element set. In particular, the algorithm
constructs an arithmetic circuit for evaluating the transform in "down-closure time" relative
to the support of the function and the evaluation domain. As an application, we
develop an algorithm that, given as input a digraph with $n$ vertices and bounded integer
weights at the edges, counts paths by weight and given length $0 \leq \ell \leq n-1$ in time
$O^*(\exp(n \cdot H(\ell/(2n))))$, where $H(p) = -p \log p - (1-p) \log(1-p)$, and the notation $O^*(\cdot)$
suppresses a factor polynomial in $n$.



1. Introduction 

Efficient algorithms for linear transformations, such as the fast Fourier transform of 
Cooley and Tukey [10] and Yates' algorithm [28], are fundamental tools both in computing
theory and in practical applications. Therefore it is surprising that some arguably elemen- 
tary transformations have apparently not been investigated from an algorithmic perspective. 

This paper contributes by studying an "intersection transform" of functions defined
on subsets of a ground set. In precise terms, let $U$ be a finite set with $n$ elements (the
ground set), let $R$ be a ring, and denote by $2^U$ the set of all subsets of $U$. The intersection
transform maps a function $f : 2^U \to R$ to the function $f\iota : \{0, 1, \ldots, n\} \times 2^U \to R$, defined
for all $j = 0, 1, \ldots, n$ and $Y \subseteq U$ by

$$f\iota_j(Y) = \sum_{\substack{X \subseteq U \\ |X \cap Y| = j}} f(X). \tag{1.1}$$

Our interest here is in particular to restrict (or "trim") the domains of the input $f$ and
the output $f\iota$ from $2^U$ to given subsets of $2^U$.
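
To make the definition (1.1) concrete, the following minimal Python sketch evaluates the intersection transform by direct iteration over all pairs of sets. It is only a reference implementation of the definition, not the fast algorithm of this paper; the function name `intersection_transform_naive`, the dictionary encoding of $f$, and the use of plain integers as the ring are assumptions made for illustration.

```python
def intersection_transform_naive(f, n_points, targets):
    """Evaluate (1.1) directly.

    `f` maps frozensets X to ring elements (plain ints here); sets that are
    absent from the dict are treated as having value 0.  The result maps
    (j, Y) to the sum of f(X) over all X with |X & Y| == j.
    """
    out = {(j, Y): 0 for Y in targets for j in range(n_points + 1)}
    for Y in targets:
        for X, value in f.items():
            out[(len(X & Y), Y)] += value
    return out


# Hypothetical toy instance on the ground set U = {0, 1, 2}.
U = {0, 1, 2}
f = {frozenset({0}): 1, frozenset({0, 1}): 2, frozenset({1, 2}): 3}
Y = frozenset({1, 2})
result = intersection_transform_naive(f, len(U), [Y])
```

In this toy instance the three sets in the support of $f$ meet $Y$ in 0, 1, and 2 points respectively, so each value of $f$ lands in a different component of the transform. The running time is proportional to the number of pairs $(X, Y)$, which is the trivial bound that the main result improves upon.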

For a subset $\mathcal{F} \subseteq 2^U$, denote by $\downarrow\mathcal{F}$ the down-closure of $\mathcal{F}$, that is, the family of
sets consisting of all the sets in $\mathcal{F}$ and their subsets. The notation $O^*(\cdot)$ in what follows
suppresses a factor polynomial in $n$. The following theorem states our main result.

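Theorem 1. There exists an algorithm that, given $\mathcal{F} \subseteq 2^U$ and $\mathcal{G} \subseteq 2^U$ as input, in time
$O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ constructs an $R$-arithmetic circuit with input gates for $f : \mathcal{F} \to R$ and
output gates that evaluate to $f\iota : \{0, 1, \ldots, n\} \times \mathcal{G} \to R$.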

This result supplies yet another tool aimed at the resolution of a long-standing open 
problem, namely that of improving upon the classical (early 1960s) dynamic programming 
algorithm for the Travelling Salesman Problem (TSP). With an $O^*(2^n)$ running time for an
instance with $n$ cities, the classical algorithm, due to Bellman [3, 4], and, independently,
Held and Karp [15], remains the fastest known exact algorithm for the TSP. Moreover,
progress has been equally stuck at $O^*(2^n)$ even if one considers the more restricted Hamiltonian
Path (HP) and the Hamiltonian Cycle (HC) problems.

Armed with Theorem 1 we show that the $O^*(2^n)$ bound can be broken in a counting
context, assuming one cares only for long paths or cycles, as opposed to the spanning paths
or cycles required by the TSP/HP/HC. (See §1.1 for a contrast with earlier work.)
Denote by $H$ the binary entropy function

$$H(p) = -p \log p - (1-p) \log(1-p), \qquad 0 \leq p \leq 1. \tag{1.2}$$

Theorem 2. There exists an algorithm that, given as input

(i) a directed graph $D$ with $n$ vertices and bounded integer weights at the edges,

(ii) two vertices, $s$ and $t$, and

(iii) a length $\ell = 0, 1, \ldots, n-1$,

counts, by total weight, the number of paths of length $\ell$ from $s$ to $t$ in $D$ in time

$$O^*\bigl(\exp\bigl(H(\ell/(2n)) \cdot n\bigr)\bigr). \tag{1.3}$$

For example, Theorem 2 implies that we can count in $O(1.7548^n)$ time with length
$\ell = 0.5n$ and in $O(1.999999999^n)$ time with length $\ell = 0.9999n$. For length $\ell = n-1$ the
bound reduces to the classical bound $O^*(2^n)$.
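
The numerical constants above can be reproduced from the bound of Theorem 2, $O^*(\exp(H(\ell/(2n)) \cdot n))$, with a few lines of Python. The sketch below is only a sanity check; the helper names `binary_entropy` and `growth_base` are hypothetical, and the natural logarithm is assumed in $H$, as the use of $\exp$ suggests.

```python
import math


def binary_entropy(p):
    """H(p) = -p log p - (1 - p) log(1 - p) with natural logarithms.

    The convention H(0) = H(1) = 0 is assumed at the endpoints.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def growth_base(ratio):
    """Base c such that the bound reads O*(c^n) for path length ratio * n."""
    return math.exp(binary_entropy(ratio / 2.0))


for ratio in (0.5, 0.9999, 1.0):
    print(ratio, growth_base(ratio))
```

For `ratio = 0.5` the base is approximately 1.75477, which rounds up to the constant 1.7548 quoted above; for `ratio = 1.0` the base is $\exp(\log 2) = 2$, matching the classical bound.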

We observe that counting implies, by self-reducibility, that we can construct examples 
of the paths within the same time bound. Similarly, we can count cycles of a given length 
within the same bound. However, the efficient listing (in the form of vertex supports, 
weights, and ends $s, t$) of all the paths for any length $\ell \geq n/2$ appears not to be possible
with present tools in $O((2-\epsilon)^n)$ time for $\epsilon > 0$ independent of $n$. Indeed, if it were possible,
we would obtain the breakthrough $O((2-\epsilon)^n)$ algorithm for generic TSP by starting the
classical algorithm from the output of the listing algorithm.

We expect Theorem 1 to have applications beyond Theorem 2, for example, in the
context of subset query problems discussed by Charikar, Indyk, and Panigrahy [8].

Given $\mathcal{F} \subseteq 2^U$ and $\mathcal{G} \subseteq 2^U$ as input, we can count in $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ time for each
$Y \in \mathcal{G}$ the number of $X \in \mathcal{F}$ that intersect $Y$ in a given number of points; in particular, for
each $Y$ we can count the number of disjoint $X$.
By duality of disjointness and set inclusion, we can thus count in $O^*(|\downarrow\mathcal{F}| + |\uparrow\mathcal{G}|)$ time
for each $Y \in \mathcal{G}$ the number of $X \in \mathcal{F}$ with $X \subseteq Y$. Here $\uparrow\mathcal{G}$ denotes the up-closure of $\mathcal{G}$,
that is, the family of sets consisting of all the sets in $\mathcal{G}$ and their supersets in $2^U$.
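
The duality used in the last step is that a set is contained in $Y$ exactly when it is disjoint from the complement of $Y$. A minimal Python sketch of this reduction follows; it counts by direct iteration rather than with the fast transform, and the function name `count_contained` and the frozenset encoding are assumptions made for illustration.

```python
def count_contained(family, targets, universe):
    """For each Y in `targets`, count the X in `family` that are subsets of Y.

    Uses the duality from the text: X is a subset of Y exactly when X is
    disjoint from the complement of Y in the ground set.
    """
    universe = frozenset(universe)
    counts = {}
    for Y in targets:
        complement = universe - Y
        counts[Y] = sum(1 for X in family if not (X & complement))
    return counts


# Hypothetical toy instance on the ground set {0, 1, 2, 3}.
family = [frozenset({0}), frozenset({0, 1}), frozenset({2, 3}), frozenset()]
targets = [frozenset({0, 1}), frozenset({0, 2, 3})]
counts = count_contained(family, targets, range(4))
```

Replacing the inner loop by the $j = 0$ component of the intersection transform, evaluated on the complements of the query sets, gives the counting problem that the stated bound addresses.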

1.1. Further remarks and earlier work 

Theorem 1 has its roots in Yates' algorithm [28] for evaluating the product of a vector
with the $n$th Kronecker power of a $2 \times 2$ matrix. While Yates' algorithm is essentially optimal,
running in $O^*(2^n)$ ring operations given an input vector with $2^n$ entries, in certain cases
the evaluation can be "trimmed", assuming one requires only sporadic entries of the output
vector. In particular, the present authors have observed [6] that the zeta and Moebius
transforms on $2^U$ are amenable to trimming (see Lemma 3 below for a precise statement).

The proof of Theorem 1 relies on a trimmed concatenation of two "dual" zeta transforms,
one that depends on supersets of a set (the "up" transform), and one that depends
on subsets of a set (the "down" transform). To provide a rough intuition, we first use
the up-zeta transform to drive information about $f$ on $\mathcal{F}$ "down" to $\downarrow\mathcal{F}$. Then we use a
"ranked" [5] down-zeta transform to assemble information "up" from $\downarrow\mathcal{G}$ to $\mathcal{G}$. Finally,
we extract the intersection transform from the information gathered at each $Y \in \mathcal{G}$. This
essentially amounts to solving a fixed system of $R$-linear equations at each $Y \in \mathcal{G}$.

This proof strategy yet again highlights a basic theme: the use of fast linear transforma- 
tions to distribute and assemble information across a domain (e.g. time, frequency, subset 
lattice) so that "local" computations in the domain (e.g. pointwise multiplication, solving 
local systems of linear equations) alternated with transforms enable the extraction of a de- 
sired result (e.g. convolution, intersection transform). Compared with earlier works such as 
[SI ini [19], the present approach establishes the serendipity of the up/down dual transforms 
and introduces the "linear equation trick" into the toolbox of local computations. 

Once Theorem 1 is available, Theorem 2 stems from the observation that a path can be
decomposed into two paths, each having half the length of the original path, with exactly
one vertex in common. Theorem 1 then enables us to "glue halves" in $\mathcal{F}$ and $\mathcal{G}$, where
$\downarrow\mathcal{F}$ and $\downarrow\mathcal{G}$ consist of sets of size at most $\lceil \ell/2 \rceil + 1$. This prompts the observation that
Theorem 1 is useful only when the bound $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ improves upon the trivial bound
$O^*(|\mathcal{F}||\mathcal{G}|)$ obtained by a direct iteration over all pairs $(X, Y) \in \mathcal{F} \times \mathcal{G}$.

We know at least one alternative way of proving Theorem 2 without using Theorem 1.
Indeed, assuming knowledge of trimming [6], one can use an algorithm of Kennes [10]
to evaluate a sum $\sum_{|Z|=j} \sum_{X \cap Y = Z} f(X) g(Y)$ given $f : \mathcal{F} \to R$ and $g : \mathcal{G} \to R$ in
$O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ ring operations (take the trimmed up-zeta transform of $f$ and $g$, take
pointwise product of transforms, take the trimmed up-Moebius transform, and sum over all
$j$-subsets in $\downarrow\mathcal{F} \cup \downarrow\mathcal{G}$). This enables one to evaluate the right-hand side of (3.8) below in
time (1.3), thus giving an alternative proof of Theorem 2.



To contrast Kennes' algorithm with Theorem 1, Kennes' algorithm computes for each
$Z \subseteq U$ the sum over pairs $(X, Y) \in \mathcal{F} \times \mathcal{G}$ with $Z = X \cap Y$, whereas (1.1) computes, for each
$Y \in \mathcal{G}$, the sum over $X \in \mathcal{F}$ with $|X \cap Y| = j$. Thus, Kennes' algorithm provides control over
the intersection $Z$ but lacks control over the pairs $(X, Y)$, whereas (1.1) provides control
over $Y$ but lacks control over the intersection (except for size).

As regards the TSP/HP/HC, earlier work on exact exponential-time algorithms can be 
divided roughly into three lines of study. (For a broader treatment of TSP/HP/HC and 
exact exponential-time algorithms, we refer to Pl I14t 1^. and |27], respectively.) 
One line of study has been to restrict the input graph, whereby a natural restriction
is to place an upper bound $\Delta$ on the degrees of the vertices. Eppstein [11] has developed
an algorithm that runs in time $O^*(2^{n/3}) = O(1.260^n)$ for $\Delta = 3$ and in time $O(1.890^n)$
for $\Delta = 4$. Iwama and Nakashima [16] have improved the $\Delta = 3$ case to $O(1.251^n)$, and
Gebauer [12] the $\Delta = 4$ case to $O(1.733^n)$. The present authors established [7] an $O((2-\epsilon)^n)$
bound for all $\Delta$, with $\epsilon > 0$ depending on $\Delta$ but not on $n$.

A second line of study has been to ease the space requirements of the algorithms from
exponential to polynomial in $n$. Karp [18] and, independently, Kohn, Gottlieb, and Kohn [20]
have shown that TSP with bounded integer weights can be solved in time $O^*(2^n)$ and space
polynomial in $n$. Combined with restrictions on the graph, one can arrive at running times
$O^*((2-\epsilon)^n)$ and polynomial space [T] ttU [E].

A third line of study relaxes the requirement on spanning paths/cycles to "long"
paths/cycles. In this setting, a simple backtrack algorithm finds a path of length $\ell$ in
time $O^*(n^\ell)$. Monien [21] observed that this can be expedited to $O^*(\ell!)$ time by a dynamic
programming approach. Alon, Yuster, and Zwick [1] introduced a seminal colour-coding
procedure and improved the running time to $O^*((2e)^\ell)$ expected and $O^*(c^\ell)$ deterministic time,
$c$ a large constant. Subsequently, combining colour-coding ideas with a divide-and-conquer
approach, Chen, Lu, Sze, and Zhang [9], and, independently, Kneis, Molle, Richter, and
Rossmanith [22], developed algorithms with $O^*(4^\ell)$ expected and $O^*(4^{\ell + o(\ell)})$ deterministic
time. A completely different approach was taken by Koutis [21], who presented an $O^*(2^{3\ell/2})$
expected time algorithm relying on a randomised technique for detecting whether a given
$n$-variate polynomial, represented as an arithmetic circuit with only sum and product gates,
has a square-free monomial of degree $\ell$ with an odd coefficient. Recently, Williams [26]
extended Koutis' technique and obtained an $O^*(2^\ell)$ expected time algorithm.

To contrast with Theorem 2, while the $O^*(2^\ell)$ bound of the Koutis-Williams [21, 26]
algorithm is superior to the bound (1.3) in Theorem 2, it is not immediate whether the
Koutis-Williams approach extends to counting problems. Furthermore, it appears challenging
to derandomise the Koutis-Williams algorithm without increasing the running time
(see [26, p. 6]), whereas the algorithm in Theorem 2 is deterministic.

2. The fast intersection transform 
2.1. Preliminaries 

For a logical proposition $P$, we use Iverson's bracket notation $[P]$ to denote a 1 if $P$ is
true, and a 0 if $P$ is false.

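Let $\mathcal{F} \subseteq 2^U$ and $f : \mathcal{F} \to R$.

Define the up-zeta transform $f\zeta^{\uparrow}$ for all $Y \subseteq U$ by

$$f\zeta^{\uparrow}(Y) = \sum_{\substack{X \in \mathcal{F} \\ Y \subseteq X}} f(X). \tag{2.1}$$

Define the down-zeta transform $f\zeta^{\downarrow}$ for all $Y \subseteq U$ by

$$f\zeta^{\downarrow}(Y) = \sum_{\substack{X \in \mathcal{F} \\ X \subseteq Y}} f(X). \tag{2.2}$$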
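
For orientation, both zeta transforms can be evaluated on all of $2^U$ with the classical Yates-style recursion over the elements of the ground set. The Python sketch below shows this untrimmed variant only; it is not the trimmed algorithm referred to in the lemma that follows. Subsets of $\{0, \ldots, n-1\}$ are encoded as bitmasks, an encoding assumed here for illustration, and the function names `up_zeta` and `down_zeta` are hypothetical.

```python
def up_zeta(values, n):
    """Return g with g[Y] = sum of values[X] over all supersets X of Y.

    `values` has length 2**n and is indexed by bitmask; sets outside the
    support of f simply carry the value 0.
    """
    g = list(values)
    for bit in range(n):
        for mask in range(1 << n):
            if not mask & (1 << bit):
                # Add the contribution of the supersets that contain `bit`.
                g[mask] += g[mask | (1 << bit)]
    return g


def down_zeta(values, n):
    """Return g with g[Y] = sum of values[X] over all subsets X of Y."""
    g = list(values)
    for bit in range(n):
        for mask in range(1 << n):
            if mask & (1 << bit):
                # Add the contribution of the subsets that omit `bit`.
                g[mask] += g[mask ^ (1 << bit)]
    return g
```

Each function performs $n 2^{n-1}$ additions, in line with the $O^*(2^n)$ cost of Yates' algorithm; trimming replaces the iteration over all bitmasks by an iteration over a closure of the relevant set families.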
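The following lemma condenses the essential properties of the "trimmed" fast zeta
transform [6].

Lemma 3. There exist algorithms that construct, given $\mathcal{F} \subseteq 2^U$ and $\mathcal{G} \subseteq 2^U$ as input, an
$R$-arithmetic circuit with input gates for $f : \mathcal{F} \to R$ and output gates that evaluate to

(1) $f\zeta^{\uparrow} : \mathcal{G} \to R$, with construction time $O^*(|\uparrow\mathcal{F}| + |\uparrow\mathcal{G}|)$;

(2) $f\zeta^{\uparrow} : \mathcal{G} \to R$, with construction time $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$;

(3) $f\zeta^{\downarrow} : \mathcal{G} \to R$, with construction time $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$; and

(4) $f\zeta^{\downarrow} : \mathcal{G} \to R$, with construction time $O^*(|\uparrow\mathcal{F}| + |\uparrow\mathcal{G}|)$.
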



2.2. The inverse of truncated Pascal's triangle 

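We work with the standard extension of the binomial coefficients to arbitrary integers
(see Graham, Knuth, and Patashnik [13]). For integers $p$ and $q$, we let

$$\binom{p}{q} = \begin{cases} \prod_{k=1}^{q} \frac{p+1-k}{k} & \text{if } q > 0; \\ 1 & \text{if } q = 0; \\ 0 & \text{if } q < 0. \end{cases} \tag{2.3}$$

The following lemma is folklore, but we recall a proof here for convenience of exposition.

Lemma 4. The integer matrices $A$ and $B$ with entries

$$a_{ij} = \binom{j}{i}, \qquad b_{ij} = (-1)^{i+j}\binom{j}{i}, \qquad i, j = 0, 1, \ldots, n \tag{2.4}$$

are mutual inverses.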

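Proof. Let us first consider the $(i, j)$-entry of $AB$:

$$\begin{aligned}
\sum_{k=0}^{n} a_{ik} b_{kj}
&= \sum_{k=0}^{n} (-1)^{k+j} \binom{k}{i} \binom{j}{k} \\
&= \sum_{k=i}^{j} (-1)^{k+j} \binom{j}{k} \binom{k}{i} \\
&= \binom{j}{i} \sum_{k=i}^{j} (-1)^{k+j} \binom{j-i}{k-i} \\
&= [i = j].
\end{aligned}$$

Here the second equality follows by observing that $j \geq 0$ implies $\binom{j}{k} = 0$ for all $k > j$;
similarly, $k \geq 0$ implies $\binom{k}{i} = 0$ for all $0 \leq k < i$. The third equality follows from an
application of the identity $\binom{p}{q}\binom{q}{r} = \binom{p}{r}\binom{p-r}{q-r}$, valid for all integers $p, q, r$ (see [13, Equation
5.21]). The last equality follows from an application of the Binomial Theorem.

The analysis for the $(i, j)$-entry of $BA$ is similar:

$$\sum_{k=0}^{n} b_{ik} a_{kj} = \sum_{k=0}^{n} (-1)^{i+k} \binom{k}{i} \binom{j}{k} = [i = j].$$

■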

It follows from Lemma 4 that the matrices $A$ and $B$ are mutual inverses over an arbitrary
ring $R$, where the entries of the matrices are understood to be embedded into $R$ via the
natural ring homomorphism $z \mapsto z_R = z \cdot 1_R$, where $1_R$ is the multiplicative identity element
of $R$, and $z$ is an integer.
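
Lemma 4 is easy to confirm numerically for small $n$. The Python sketch below builds the truncated Pascal matrix with entries $\binom{j}{i}$ and its signed counterpart with entries $(-1)^{i+j}\binom{j}{i}$ and multiplies them over the integers; the helper names `pascal_pair` and `mat_mul` are hypothetical and serve only this check.

```python
from math import comb


def pascal_pair(n):
    """Return the (n+1) x (n+1) integer matrices A and B of Lemma 4."""
    size = n + 1
    # math.comb(j, i) is 0 whenever i > j, matching the convention (2.3)
    # for the nonnegative indices used here.
    A = [[comb(j, i) for j in range(size)] for i in range(size)]
    B = [[(-1) ** (i + j) * comb(j, i) for j in range(size)] for i in range(size)]
    return A, B


def mat_mul(P, Q):
    """Plain integer matrix product of two square matrices."""
    size = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


A, B = pascal_pair(6)
identity = [[int(i == j) for j in range(7)] for i in range(7)]
print(mat_mul(A, B) == identity and mat_mul(B, A) == identity)
```

Because both matrices have integer entries, the same identity holds after embedding into any ring $R$, which is the form in which the lemma is used in the sequel.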

2.3. Proof of Theorem 1

We first describe the algorithm and then prove its correctness. All arithmetic in the 
evaluations, and all derivations in subsequent proofs, are carried out in the ring R. 

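Let $\mathcal{F} \subseteq 2^U$ and $\mathcal{G} \subseteq 2^U$ be given as input to the algorithm. The circuit is a sequence
of three "modules" starting at the input gates for $f : \mathcal{F} \to R$.

1. Up-transform. Evaluate the up-zeta transform

$$g = f\zeta^{\uparrow} \quad \text{on } \downarrow\mathcal{F} \tag{2.5}$$

with a circuit of size $O^*(|\downarrow\mathcal{F}|)$ using Lemma 3(2). Observe that (2.1) implies that all nonzero
values of $f\zeta^{\uparrow}$ are in $\downarrow\mathcal{F}$.

2. Down-transform by rank. For each $i = 0, 1, \ldots, n$, evaluate $g^{(i)}$, the component of $g$
with rank $i$, on $\downarrow\mathcal{F}$; that is, for all $X \in \downarrow\mathcal{F}$ set

$$g^{(i)}(X) = \begin{cases} g(X) & \text{if } |X| = i; \\ 0 & \text{otherwise.} \end{cases} \tag{2.6}$$

Then, for each $i = 0, 1, \ldots, n$, evaluate

$$y_i = g^{(i)}\zeta^{\downarrow} \quad \text{on } \mathcal{G} \tag{2.7}$$

with a circuit of size $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ using Lemma 3(3).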

3. Recover the intersection transform. Let $B_R$ be the matrix $B$ in Lemma 4 with entries
embedded to $R$. Associate with each $Y \in \mathcal{G}$ the column vector

$$y(Y) = (y_0(Y), y_1(Y), \ldots, y_n(Y))^T.$$

For each $Y \in \mathcal{G}$, evaluate the column vector

$$x(Y) = (x_0(Y), x_1(Y), \ldots, x_n(Y))^T$$

as the matrix-vector product

$$x(Y) = B_R\, y(Y). \tag{2.8}$$

Because the matrix $B_R$ is fixed, this can be implemented with $O^*(|\mathcal{G}|)$ fixed $R$-arithmetic
gates.

The circuit thus consists of $O^*(|\downarrow\mathcal{F}| + |\downarrow\mathcal{G}|)$ $R$-arithmetic gates. It remains to show that
the circuit actually evaluates the intersection transform of $f$.
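
The three modules can be sketched in Python as follows. This is an untrimmed illustration that iterates over all $2^n$ bitmasks instead of the down-closures, so it runs in $O^*(2^n)$ time and does not attain the bound of Theorem 1; it is meant only to show how the modules fit together. The bitmask encoding, the integer ring, and the function name `fast_intersection_transform` are assumptions, and the matrix in the last step is taken to be the signed Pascal matrix with entries $b_{ij} = (-1)^{i+j}\binom{j}{i}$.

```python
from math import comb


def fast_intersection_transform(f, n):
    """Return x with x[j][Y] = sum of f[X] over all X with |X & Y| == j.

    `f` is a list of length 2**n indexed by bitmask (0 outside the support).
    """
    size = 1 << n

    # Module 1: up-zeta transform, g[Y] = sum of f[X] over supersets X of Y.
    g = list(f)
    for bit in range(n):
        for mask in range(size):
            if not mask & (1 << bit):
                g[mask] += g[mask | (1 << bit)]

    # Module 2: split g by rank i and take the down-zeta transform of each part.
    y = []
    for i in range(n + 1):
        part = [g[mask] if bin(mask).count("1") == i else 0 for mask in range(size)]
        for bit in range(n):
            for mask in range(size):
                if mask & (1 << bit):
                    part[mask] += part[mask ^ (1 << bit)]
        y.append(part)

    # Module 3: x(Y) = B y(Y), where row j of B has entries (-1)^(j+i) * C(i, j).
    return [[sum((-1) ** (j + i) * comb(i, j) * y[i][mask] for i in range(n + 1))
             for mask in range(size)]
            for j in range(n + 1)]
```

A quick way to gain confidence in the sketch is to compare `x[j][Y]` with the direct sum of `f[X]` over all `X` with `bin(X & Y).count("1") == j` on a small random input.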

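Lemma 5. For all $Y \in \mathcal{G}$ and $j = 0, 1, \ldots, n$ it holds that $x_j(Y) = f\iota_j(Y)$.

Proof. Let $Y \in \mathcal{G}$ and $i = 0, 1, \ldots, n$. Consider the following derivation:

$$\begin{aligned}
y_i(Y)
&= \sum_{\substack{Z \subseteq Y \\ |Z| = i}} \; \sum_{\substack{X \in \mathcal{F} \\ Z \subseteq X}} f(X) \\
&= \sum_{X \in \mathcal{F}} f(X) \sum_{\substack{Z \subseteq X \cap Y \\ |Z| = i}} 1_R \\
&= \sum_{X \in \mathcal{F}} f(X) \binom{|X \cap Y|}{i}_R \\
&= \sum_{j=0}^{n} \binom{j}{i}_R \sum_{\substack{X \in \mathcal{F} \\ |X \cap Y| = j}} f(X) \\
&= \sum_{j=0}^{n} (a_{ij})_R \, f\iota_j(Y).
\end{aligned} \tag{2.9}$$

Here the first equality expands the definitions (2.7), (2.2), (2.6), (2.5), and (2.1). The
second equality follows by changing the order of summation and observing that $Z \subseteq X \cap Y$
if and only if both $Z \subseteq X$ and $Z \subseteq Y$. The fourth equality follows by collecting the terms
with $|X \cap Y| = j$ together. The last equality follows from (2.4) and (1.1).

Now let $j = 0, 1, \ldots, n$, and observe that (2.8), (2.9), and Lemma 4 imply

$$\begin{aligned}
x_j(Y)
&= \sum_{i=0}^{n} (b_{ji})_R \, y_i(Y) \\
&= \sum_{i=0}^{n} (b_{ji})_R \sum_{k=0}^{n} (a_{ik})_R \, f\iota_k(Y) \\
&= \sum_{k=0}^{n} \Bigl( \sum_{i=0}^{n} (b_{ji})_R (a_{ik})_R \Bigr) f\iota_k(Y) \\
&= \sum_{k=0}^{n} \Bigl( \sum_{i=0}^{n} b_{ji} a_{ik} \Bigr)_R f\iota_k(Y) \\
&= \sum_{k=0}^{n} [j = k]_R \, f\iota_k(Y) \\
&= f\iota_j(Y).
\end{aligned}$$

■
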
3. Counting paths

3.1. Preliminaries 

We require some preliminaries before proceeding with the proof of Theorem 2. For
basic graph-theoretic terminology we refer to West [25].

Let $D$ be an $n$-vertex digraph with vertex set $V$ and edge set $E$, possibly with loops
and parallel edges. (However, to avoid further technicalities in the bound (1.3), we assume
that the number of edges in $D$ is bounded from above by a polynomial in $n$.) Associated
with each edge $e \in E$ is a weight $w(e) \in \{0, 1, \ldots\}$. For an edge $e \in E$, denote by $e^-$
(respectively, $e^+$) the start vertex (respectively, the end vertex) of $e$.

It is convenient to work with the terminology of walks instead of paths. A walk of
length $\ell$ in $D$ is a tuple $W = (v_0, e_1, v_1, e_2, v_2, \ldots, v_{\ell-1}, e_\ell, v_\ell)$ such that $v_0, v_1, \ldots, v_\ell \in V$,
$e_1, e_2, \ldots, e_\ell \in E$, and, for each $i = 1, 2, \ldots, \ell$, it holds that $e_i^- = v_{i-1}$ and $e_i^+ = v_i$. The
walk $W$ is said to be from $v_0$ to $v_\ell$.

A walk is simple if $v_0, v_1, \ldots, v_\ell$ are distinct vertices. The set of distinct vertices
occurring in a walk is the support of the walk. We denote the support of a walk $W$ by $\mathrm{supp}(W)$.
The weight of a walk $W$ is the sum of the weights of the edges in the walk; a walk with no
edges has zero weight. We write $w(W)$ for the weight of $W$.

For $s, t \in V$ and $S \subseteq V$ we denote by $\mathcal{W}_{s,t}(S)$ the set of all simple walks from $s$ to $t$
with support $S$. Observe that $\mathcal{W}_{s,t}(S)$ is empty unless both $s \in S$ and $t \in S$.

Let $z$ be a polynomial indeterminate, and define an associated polynomial generating
function by

$$f_{s,t}(S) = \sum_{W \in \mathcal{W}_{s,t}(S)} z^{w(W)}. \tag{3.1}$$

Put otherwise, the coefficient of each monomial $z^w$ of $f_{s,t}(S)$ enumerates the simple walks
from $s$ to $t$ with support $S$ and weight $w$.

For $k = 0, 1, \ldots, n$, denote by $\binom{V}{k}$ the set of all $k$-subsets of $V$.

For $\ell = 0, 1, \ldots, n-1$, define a polynomial generating function by

$$g_{s,t}(\ell) = \sum_{J \in \binom{V}{\ell+1}} f_{s,t}(J). \tag{3.2}$$

Put otherwise, the coefficient of each monomial $z^w$ of $g_{s,t}(\ell)$ enumerates the simple walks
from $s$ to $t$ with length $\ell$ and weight $w$.

3.2. Proof of Theorem 2

Let $B \in \{0, 1, \ldots\}$ be fixed. Let $D$ be a digraph with $n$ vertices and edge weights
$w(e) \in \{0, 1, \ldots, B\}$ for all $e \in E$. Let $s, t \in V$. Let $\ell = 0, 1, \ldots, n-1$.

With the objective of eventually applying Theorem 1, let $U = V$ and let $R$ be the
univariate polynomial ring over $z$ with integer coefficients.

To compute $g_{s,t}(\ell)$, proceed as follows. First observe that the generating polynomials
(3.1) can be computed by the following recursion on subsets of $V$. The singleton sets
$\{s\}$, $s \in V$, form the base case of the recursion:

$$f_{s,s}(\{s\}) = 1. \tag{3.3}$$
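The recursive step is defined for all $s, t \in V$ and $S \subseteq V$ with $t \in S$, $|S| \geq 2$, by

$$f_{s,t}(S) = \sum_{a \in S \setminus \{t\}} f_{s,a}(S \setminus \{t\}) \sum_{\substack{e \in E \\ e^- = a \\ e^+ = t}} z^{w(e)}. \tag{3.4}$$

Recall that $f_{s,t}(S) = 0$ unless both $s \in S$ and $t \in S$.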


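Now, using (3.3) and (3.4), evaluate

$$p_{s,a} = f_{s,a} \quad \text{on } \binom{V}{\lfloor \ell/2 \rfloor + 1} \tag{3.5}$$

for each $a \in V$. Then, using (3.3) and (3.4) again, evaluate

$$q_{a,t} = f_{a,t} \quad \text{on } \binom{V}{\lceil \ell/2 \rceil + 1} \tag{3.6}$$

for each $a \in V$. Next, using the algorithm in Theorem 1 with $\mathcal{F} = \binom{V}{\lceil \ell/2 \rceil + 1}$ and $\mathcal{G} = \binom{V}{\lfloor \ell/2 \rfloor + 1}$, evaluate

$$r_{a,t} = q_{a,t}\iota_1 \quad \text{on } \binom{V}{\lfloor \ell/2 \rfloor + 1} \tag{3.7}$$

for each $a \in V$. Finally, evaluate the right-hand side of

$$g_{s,t}(\ell) = \sum_{a \in V} \sum_{S \in \binom{V}{\lfloor \ell/2 \rfloor + 1}} p_{s,a}(S)\, r_{a,t}(S) \tag{3.8}$$

by direct summation.

The entire evaluation can thus be carried out with an $R$-arithmetic circuit of size

$$O^*\Bigl( \Bigl|\downarrow\tbinom{V}{\lceil \ell/2 \rceil + 1}\Bigr| + \Bigl|\downarrow\tbinom{V}{\lfloor \ell/2 \rfloor + 1}\Bigr| \Bigr) \tag{3.9}$$

that can be constructed in similar time.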

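To justify the equality in (3.8), consider the following derivation:

$$\begin{aligned}
\sum_{a \in V} \sum_{S \in \binom{V}{\lfloor \ell/2 \rfloor + 1}} p_{s,a}(S)\, r_{a,t}(S)
&= \sum_{a \in V} \sum_{S \in \binom{V}{\lfloor \ell/2 \rfloor + 1}} f_{s,a}(S) \sum_{\substack{T \in \binom{V}{\lceil \ell/2 \rceil + 1} \\ |S \cap T| = 1}} f_{a,t}(T) \\
&= \sum_{a \in V} \sum_{S \in \binom{V}{\lfloor \ell/2 \rfloor + 1}} \sum_{\substack{T \in \binom{V}{\lceil \ell/2 \rceil + 1} \\ |S \cap T| = 1}} \sum_{W_{sa} \in \mathcal{W}_{s,a}(S)} \sum_{W_{at} \in \mathcal{W}_{a,t}(T)} z^{w(W_{sa}) + w(W_{at})} \\
&= \sum_{a \in V} \sum_{S \in \binom{V}{\lfloor \ell/2 \rfloor + 1}} \sum_{\substack{T \in \binom{V}{\lceil \ell/2 \rceil + 1} \\ S \cap T = \{a\}}} \sum_{W_{sa} \in \mathcal{W}_{s,a}(S)} \sum_{W_{at} \in \mathcal{W}_{a,t}(T)} z^{w(W_{sa}) + w(W_{at})} \\
&= \sum_{J \in \binom{V}{\ell+1}} \sum_{W \in \mathcal{W}_{s,t}(J)} z^{w(W)} \\
&= g_{s,t}(\ell).
\end{aligned}$$

Here the first two equalities expand (3.5), (3.7), (1.1), (3.6), and (3.1). The third equality
follows by observing that $\mathcal{W}_{s,a}(S)$ and $\mathcal{W}_{a,t}(T)$ are both nonempty only if $a \in S$ and $a \in T$.
Thus, $|S \cap T| = 1$ implies that only terms with $S \cap T = \{a\}$ appear in the sum. The fourth
equality is justified as follows. First observe that an arbitrary walk $W$ of length $\ell$ from $s$ to
$t$ has the property that there exists a $J \in \binom{V}{\ell+1}$ with $\mathrm{supp}(W) = J$ if and only if the walk
is simple. Moreover, a simple walk $W$ of length $\ell$ from $s$ to $t$ has a bijective decomposition
$W \mapsto (W_{sa}, W_{at})$ into two simple subwalks, $W_{sa}$ and $W_{at}$, with $\mathrm{supp}(W_{sa}) \cap \mathrm{supp}(W_{at}) = \{a\}$
for some $a \in V$. Indeed, $W_{sa}$ is the length-$\lfloor \ell/2 \rfloor$ prefix of $W$ from $s$ to some $a \in V$, and $W_{at}$
is the length-$\lceil \ell/2 \rceil$ suffix of $W$ from $a$ to $t$. Conversely, prepend $W_{sa}$ to $W_{at}$, deleting one
occurrence of $a$ in the process, to get $W$. The fifth equality follows from (3.2) and (3.1).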
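
The "gluing of halves" can be checked on small instances with the Python sketch below. To keep it short, the sketch ignores weights (every walk counts as 1), assumes a digraph without parallel edges given as a dictionary of out-neighbour sets, counts the half-walks by brute force, and evaluates the $j = 1$ intersection transform naively; none of these choices reflects the fast algorithm. The function names `count_simple_walks` and `count_by_halves` are hypothetical.

```python
from itertools import combinations, permutations


def count_simple_walks(adj, s, t, S):
    """Number of simple walks from s to t whose support is exactly S (brute force)."""
    if s not in S or t not in S:
        return 0
    if len(S) == 1:
        return 1 if s == t else 0
    if s == t:
        return 0
    inner = [v for v in S if v not in (s, t)]
    total = 0
    for order in permutations(inner):
        walk = (s, *order, t)
        if all(b in adj[a] for a, b in zip(walk, walk[1:])):
            total += 1
    return total


def count_by_halves(adj, s, t, length):
    """Count simple walks of the given length from s to t by gluing two halves."""
    V = sorted(adj)
    lo = length // 2          # floor(length / 2): length of the prefix
    hi = length - lo          # ceil(length / 2): length of the suffix
    total = 0
    for a in V:
        for S in map(frozenset, combinations(V, lo + 1)):
            p = count_simple_walks(adj, s, a, S)
            if p == 0:
                continue
            # Naive j = 1 intersection transform of the suffix counts at S.
            r = sum(count_simple_walks(adj, a, t, T)
                    for T in map(frozenset, combinations(V, hi + 1))
                    if len(S & T) == 1)
            total += p * r
    return total


# Hypothetical toy digraph: 0 -> 1, 0 -> 2, 1 -> 2, 1 -> 3, 2 -> 3.
adj = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: set()}
print(count_by_halves(adj, 0, 3, 2), count_by_halves(adj, 0, 3, 3))
```

In the toy digraph the simple walks from 0 to 3 are 0-1-3 and 0-2-3 of length 2 and 0-1-2-3 of length 3, and the glued count agrees with a direct enumeration over all supports of size $\ell + 1$.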

It remains to analyse the total running time of constructing and evaluating the circuit
in terms of $n$ and $\ell$.

Because $B$ is fixed, all the ring operations are carried out on polynomials of degree at
most $Bn = O(n)$. Moreover, denoting by $m$ the number of edges in $D$, the coefficients in
the polynomials are integers bounded in absolute value by 2™2^", where 2™ is an upper 
bound for the coefficients in (3.1 ) and (3.2 ), and 2^" is an upper bound for the expansion in 
intermediate values in the transforms. (Both bounds are far from tight.) Recalling that we 
assume that m is bounded from above by a polynomial in n, we have that the coefficients 
can be represented using a number of bits that is bounded from above by a polynomial in 
n. It follows that each ring operation runs in time bounded from above by a polynomial in 
n. 

To conclude that the algorithm runs within the claimed upper bound (1.3), combine
(3.9) with the observation that for every $0 \leq p \leq 1/2$ it holds that

$$\sum_{k=0}^{\lfloor np \rfloor} \binom{n}{k} \leq \exp(H(p) \cdot n), \tag{3.10}$$

where $H$ is the binary entropy function (1.2). (For a proof of (3.10), see Jukna [17, p. 283].)
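
The entropy bound on partial sums of binomial coefficients is easy to probe numerically. The Python sketch below compares the two sides for a range of small parameters; it is an empirical check rather than a proof, and the helper names `binomial_prefix_sum` and `entropy_bound` are hypothetical.

```python
import math


def binomial_prefix_sum(n, p):
    """Sum of C(n, k) for k = 0, 1, ..., floor(n * p)."""
    return sum(math.comb(n, k) for k in range(math.floor(n * p) + 1))


def entropy_bound(n, p):
    """exp(H(p) * n) with H the binary entropy in natural logarithms, 0 < p < 1."""
    h = -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)
    return math.exp(h * n)


for n in (10, 20, 40):
    for p in (0.1, 0.25, 0.5):
        print(n, p, binomial_prefix_sum(n, p), entropy_bound(n, p))
```

At $p = 1/2$ the right-hand side equals $2^n$, so the bound is within a constant factor of the left-hand side there, whereas for smaller $p$ the gap is a factor polynomial in $n$, which the $O^*(\cdot)$ notation absorbs.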